Contextual Max Pooling for Human Action Recognition
Similar resources
Eigen Evolution Pooling for Human Action Recognition
We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors while preserving as much information about the sequence as possible, especially the temporal evolution of the features. Eigen evolution pooling is a general pool...
Attentional Pooling for Action Recognition
We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state-of-the-art base archite...
Adaptive Structured Pooling for Action Recognition
where $s \in S_k$ and $\Psi_s(p) = 1$ if $p \in s$ and $\Psi_s(p) = 0$ otherwise. $M_k^t$ is L1-normalized and square-rooted. For a video of $T$ frames: $M_k(x, y, t) = \{ M_k^1(x, y), \ldots, M_k^T(x, y) \}$. For each feature $x_m \in X$, with $(x_{x_m}, y_{x_m}, t_{x_m})$ as the spatiotemporal coordinates of its centroid, the weight $w_m$ is a local integral of the pooling map $M_k$: $w_m = \int_{x_{x_m}-v_x}^{x_{x_m}+v_x} \int_{y_{x_m}-v_y}^{y_{x_m}+v_y} \int_{t_{x_m}-v_t}^{t_{x_m}+v_t} M_k(x, y, t)\, dx\, dy$...
Second-order Temporal Pooling for Action Recognition
Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...
Temporal Selective Max Pooling Towards Practical Face Recognition
In this report, we deal with two challenges when building a real-world face recognition system: the pose variation in uncontrolled environments and the computational expense of processing a video stream. First, we argue that the frame-wise feature mean is unable to characterize the variation among frames. We propose to preserve the overall pose diversity if we want the video feature to represent ...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2015
ISSN: 0916-8532, 1745-1361
DOI: 10.1587/transinf.2014edl8221